Training Better CNNs Requires to Rethink ReLU
Authors
Abstract
With the rapid development of Deep Convolutional Neural Networks (DCNNs), numerous works focus on designing better network architectures (e.g., AlexNet, VGG, Inception, ResNet, and DenseNet). Nevertheless, all these networks share the same characteristic: each convolutional layer is followed by an activation layer, most commonly a Rectified Linear Unit (ReLU). In this work, we argue that the paired module with a 1:1 convolution-to-ReLU ratio is not the best choice, since it may result in poor generalization ability. We therefore investigate which convolution-to-ReLU ratio is more suitable for building better network architectures. Specifically, inspired by Leaky ReLU, we adopt a proportional module with an N:M (N>M) convolution-to-ReLU ratio to design better networks. From the perspective of ensemble learning, Leaky ReLU can be viewed as an ensemble of networks with different convolution-to-ReLU ratios. Through the analysis of a simple Leaky ReLU model, we find that the proportional module with an N:M (N>M) ratio helps networks achieve better performance. By utilizing this proportional module, many popular networks can form richer representations, since the N:M (N>M) module uses information more effectively. Furthermore, we apply this module in diverse DCNN models to explore whether the N:M (N>M) convolution-to-ReLU ratio is indeed more effective. Our experimental results show that this simple yet effective method achieves better performance on different benchmarks with various network architectures, verifying the superiority of the proportional module. In addition, to our knowledge, this is the first time a proportional module has been introduced into DCNN models. We believe the proposed method can help researchers design better network architectures.

Introduction

Nowadays, with the availability of large-scale image datasets (e.g., ImageNet (Russakovsky et al. 2015)) as well as high-performance computing resources such as GPUs, deep Convolutional Neural Networks (CNNs) (LeCun et al. 1998) have become dominant in many computer vision applications, especially image classification (Krizhevsky, Sutskever, and Hinton 2012).
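The proportional module described in the abstract can be pictured as a block in which several convolutions share a single activation. Below is a minimal PyTorch sketch of a 2:1 (N=2, M=1) convolution-to-ReLU block; it is an illustration of the idea, not the authors' implementation, and the 3x3 kernels and batch normalization are assumptions made for concreteness.

import torch
import torch.nn as nn


class ProportionalBlock(nn.Module):
    """Illustrative 2:1 convolution-to-ReLU block (N=2, M=1).

    Two stacked convolutions are followed by a single ReLU instead of the
    usual conv-ReLU-conv-ReLU pairing. Kernel size and batch normalization
    are assumptions of this sketch, not details taken from the paper.
    """

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # The first convolution is left linear (no activation) ...
        x = self.bn1(self.conv1(x))
        # ... and only the second convolution is followed by a ReLU,
        # giving the 2:1 convolution-to-ReLU ratio.
        return self.relu(self.bn2(self.conv2(x)))


if __name__ == "__main__":
    block = ProportionalBlock(3, 16)
    print(block(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])

Stacking blocks of this kind in place of the usual conv-ReLU pairs is the sort of architectural change the abstract argues for.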
Similar Papers
Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels
In this paper, we consider parameter recovery for non-overlapping convolutional neural networks (CNNs) with multiple kernels. We show that when the inputs follow a Gaussian distribution and the sample size is sufficiently large, the squared loss of such CNNs is locally strongly convex in a basin of attraction near the global optima for most popular activation functions, like ReLU, Leaky ReLU, Squ...
Phone recognition with hierarchical convolutional deep maxout networks
Deep convolutional neural networks (CNNs) have recently been shown to outperform fully connected deep neural networks (DNNs) both on low-resource and on large-scale speech tasks. Experiments indicate that convolutional networks can attain a 10–15 % relative improvement in the word error rate of large vocabulary recognition tasks over fully connected deep networks. Here, we explore some refineme...
Investigation of parametric rectified linear units for noise robust speech recognition
Convolutional neural networks with rectified linear units (ReLU) have been successful in speech recognition and computer vision tasks. ReLU was proposed as a better match to biological neural activation functions than the sigmoidal non-linearity. However, ReLU has the disadvantage that the gradient is zero whenever the unit is not active or saturated. To alleviate the potential problem...
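For context, the parametric rectified linear unit studied there keeps a small learned slope on the negative side, so the gradient does not vanish for inactive units. A minimal sketch of that behavior (PyTorch also provides it directly as nn.PReLU):

import torch
import torch.nn as nn


class SimplePReLU(nn.Module):
    """f(x) = x for x > 0 and a * x otherwise, with a learned slope a."""

    def __init__(self, init_slope=0.25):
        super().__init__()
        # A single learnable negative-side slope, shared across all inputs.
        self.slope = nn.Parameter(torch.tensor(init_slope))

    def forward(self, x):
        # Unlike plain ReLU, the negative branch has gradient `slope`
        # rather than zero, so inactive units can still receive updates.
        return torch.where(x > 0, x, self.slope * x)


x = torch.linspace(-2.0, 2.0, steps=5)
print(SimplePReLU()(x))  # tensor([-0.5000, -0.2500, 0.0000, 1.0000, 2.0000], ...)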
CNNs are Globally Optimal Given Multi-Layer Support
Stochastic Gradient Descent (SGD) is the central workhorse for training modern CNNs. Although it gives impressive empirical performance, it can be slow to converge. In this paper we explore a novel strategy for training a CNN using an alternation strategy that offers substantial speedups during training. We make the following contributions: (i) replace the ReLU non-linearity within a CNN with posi...
EraseReLU: A Simple Way to Ease the Training of Deep Convolution Neural Networks
For most state-of-the-art architectures, the Rectified Linear Unit (ReLU) has become a standard component accompanying each layer. Although ReLU can ease network training to an extent, its characteristic of blocking negative values may suppress the propagation of useful information and lead to difficulty in optimizing very deep Convolutional Neural Networks (CNNs). Moreover, stacking of layers w...
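As the title suggests, the idea behind EraseReLU is to drop the activation after selected layers. One rough way to express that on an existing PyTorch block (an illustration under that assumption, not the paper's code) is to swap a block's final ReLU for an identity mapping:

import torch.nn as nn


def erase_last_relu(block: nn.Sequential) -> nn.Sequential:
    """Replace the last ReLU in a Sequential block with Identity.

    This mirrors the general idea of removing some activations so that
    negative values can propagate; which layers to modify is a choice
    left to the experiment, not prescribed here.
    """
    layers = list(block)
    for i in range(len(layers) - 1, -1, -1):
        if isinstance(layers[i], nn.ReLU):
            layers[i] = nn.Identity()
            break
    return nn.Sequential(*layers)


block = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
print(erase_last_relu(block))  # the second ReLU is now Identity()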
Journal: CoRR
Volume: abs/1709.06247
Issue: -
Pages: -
Publication year: 2017